
    Assessing record linkage between health care and Vital Statistics databases using deterministic methods

    BACKGROUND: We assessed the linkage rate and correct linkage rate of deterministic record linkage among three commonly used Canadian databases: the population registry, hospital discharge data and the Vital Statistics registry. METHODS: Three combinations of four personal identifiers (surname, first name, sex and date of birth) were compared to determine the optimal combination. The correct linkage rate was assessed using a unique personal health number available in all three databases. RESULTS: Among the three combinations, surname, sex and date of birth gave the highest linkage rates (88.0% and 93.1%) and the second highest correct linkage rates (96.9% and 98.9%) for linkage of the population registry and of the hospital discharge data, respectively, to the Vital Statistics registry in 2001. Adding first name to these three identifiers increased the correct linkage rate by less than 1%, but at the cost of lowering the linkage rate by almost 10%. CONCLUSION: Our findings suggest that the combination of surname, sex and date of birth is optimal for deterministic linkage. The linkage and correct linkage rates appear to vary by age and type of database, but not by sex.
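    The deterministic strategy described here can be sketched as an exact-match join on a combination of identifiers, with the unique personal health number serving as the gold standard for correctness. A minimal sketch follows; the records, field names (`surname`, `sex`, `dob`, `phn`) and values are hypothetical toy data, not the study's databases.

    ```python
    from collections import defaultdict

    def deterministic_link(left, right, keys):
        """Link two record lists on an exact match of the given identifier keys.

        Returns (left_record, right_record) pairs. Key combinations that are
        ambiguous (more than one candidate on the right) are skipped, a common
        conservative choice in deterministic linkage.
        """
        index = defaultdict(list)
        for r in right:
            index[tuple(r[k] for k in keys)].append(r)
        pairs = []
        for l in left:
            candidates = index[tuple(l[k] for k in keys)]
            if len(candidates) == 1:  # unambiguous match only
                pairs.append((l, candidates[0]))
        return pairs

    def linkage_rates(pairs, n_left):
        """Linkage rate = linked / total; correct rate = agreeing gold IDs / linked."""
        linked = len(pairs)
        correct = sum(1 for l, r in pairs if l["phn"] == r["phn"])
        return linked / n_left, correct / linked if linked else 0.0

    # Toy records; "phn" stands in for the personal health number gold standard.
    registry = [
        {"surname": "Smith", "sex": "F", "dob": "1950-01-02", "phn": 1},
        {"surname": "Jones", "sex": "M", "dob": "1948-07-21", "phn": 2},
        {"surname": "Brown", "sex": "F", "dob": "1962-03-15", "phn": 3},
    ]
    vital_stats = [
        {"surname": "Smith", "sex": "F", "dob": "1950-01-02", "phn": 1},
        {"surname": "Jones", "sex": "M", "dob": "1948-07-21", "phn": 9},  # links, but to the wrong person
    ]

    pairs = deterministic_link(registry, vital_stats, ("surname", "sex", "dob"))
    link_rate, correct_rate = linkage_rates(pairs, len(registry))
    ```

    On this toy data two of three registry records link (linkage rate 2/3) but only one carries the same health number (correct linkage rate 1/2), illustrating how the two rates can diverge.
    
    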

    A review for clinical outcomes research: hypothesis generation, data strategy, and hypothesis-driven statistical analysis

    In recent years, more and more large, population-level databases have become available for clinical research. The size and complexity of these databases often present methodological challenges for investigators. We propose that a “protocol” may facilitate the research process when using these databases. In addition, much like the structured History and Physical (H&P) helps the audience appreciate the details of a patient case more systematically, a formal outcomes research protocol can help in the systematic evaluation of an outcomes research manuscript.

    Technical challenges of providing record linkage services for research

    Background: Record linkage techniques are widely used to enable health researchers to gain event-based longitudinal information for entire populations. The task of record linkage is increasingly being undertaken by specialised linkage units (SLUs). In addition to the complexity of undertaking probabilistic record linkage, these units face additional technical challenges in providing record linkage ‘as a service’ for research. The extent of this functionality, and approaches to solving these issues, has had little focus in the record linkage literature. Few, if any, of the record linkage packages or systems currently used by SLUs include the full range of functions required. Methods: This paper identifies and discusses some of the functions that are required or undertaken by SLUs in the provision of record linkage services. These include managing routine, on-going linkage; storing and handling changing data; handling different linkage scenarios; and accommodating ever-increasing datasets. Automated linkage processes are one way of ensuring consistency of results and scalability of service. Results: Alternative solutions to some of these challenges are presented. By maintaining a full history of links, and storing pairwise information, many of the challenges around handling ‘open’ records and providing automated managed extractions are solved. A number of these solutions were implemented as part of the development of the National Linkage System (NLS) by the Centre for Data Linkage (part of the Population Health Research Network) in Australia. Conclusions: The demand for, and complexity of, linkage services are growing. This presents a challenge to SLUs as they seek to service the varying needs of dozens of research projects annually. Linkage units need to be both flexible and scalable to meet this demand. It is hoped the solutions presented here can help mitigate these difficulties.

    A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage

    Record linkage (RL) is the process of identifying and linking data that relate to the same physical entity across multiple heterogeneous data sources. Deterministic linkage methods rely on the presence of common, uniquely identifying attributes across all sources, while probabilistic approaches use non-unique attributes and calculate similarity indexes for pairwise comparisons. A key component of record linkage is accuracy assessment: the process of manually verifying and validating matched pairs to further refine linkage parameters and increase overall effectiveness. This process, however, is time-consuming and impractical when applied to large administrative data sources where millions of records must be linked. Additionally, it is potentially biased, as the gold standard used is often the reviewer’s intuition. In this paper, we present an approach for assessing and refining the accuracy of probabilistic linkage based on different supervised machine learning methods (decision trees, naïve Bayes, logistic regression, random forest, linear support vector machines and gradient boosted trees). We used data sets extracted from very large Brazilian socioeconomic and public health care data sources. These models were evaluated using receiver operating characteristic plots, sensitivity, specificity and positive predictive values collected from a 10-fold cross-validation method. Results show that logistic regression outperforms other classifiers and enables the creation of a generalized, very accurate model to validate linkage results.
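    The core idea, training a classifier on pairwise comparison vectors, can be sketched with a hand-rolled, stdlib-only logistic regression. The synthetic similarity features below are hypothetical and stand in for the paper's Brazilian data; real pipelines would use a library implementation and proper cross-validation.

    ```python
    import math
    import random

    def _sigmoid(z):
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-z))

    def train_logistic(X, y, epochs=500, lr=0.5):
        """Batch gradient descent for logistic regression; returns weights and bias."""
        w = [0.0] * len(X[0])
        b = 0.0
        n = len(X)
        for _ in range(epochs):
            gw = [0.0] * len(w)
            gb = 0.0
            for xi, yi in zip(X, y):
                err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                for j, xj in enumerate(xi):
                    gw[j] += err * xj
                gb += err
            w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
            b -= lr * gb / n
        return w, b

    def predict(w, b, xi):
        return 1 if _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0

    # Synthetic comparison vectors: [name similarity, date-of-birth similarity].
    # True matches cluster near 1.0, non-matches near 0.0 (hypothetical data).
    random.seed(0)
    X, y = [], []
    for _ in range(200):
        if random.random() < 0.5:
            X.append([random.uniform(0.8, 1.0), random.uniform(0.8, 1.0)])
            y.append(1)
        else:
            X.append([random.uniform(0.0, 0.5), random.uniform(0.0, 0.5)])
            y.append(0)

    w, b = train_logistic(X, y)
    accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
    ```

    The learned weights play the same role as the agreement weights in classical probabilistic linkage, but are fitted from labelled pairs rather than set by hand.
    
    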

    Medical record linkage in health information systems by approximate string matching and clustering

    BACKGROUND: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of information and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making by computing a value reflecting identity proximity. METHODS: The proposed method has three steps. The first is to standardise and index elementary identity fields, using blocking variables, in order to speed up information analysis. The second is to match similar record pairs, relying on a global similarity value taken from the Porter-Jaro-Winkler algorithm. The third is to create clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. RESULTS: The batch analysis of 300,000 "supposedly" distinct identities isolated 240,000 true unique records, 24,000 duplicates (clusters composed of 2 records) and 3,000 clusters whose size is greater than or equal to 3 records. CONCLUSION: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e. real-time) proximity detection when inserting a new identity.
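    The similarity measure behind the matching step can be sketched with the standard Jaro-Winkler formulation (the Porter-Jaro-Winkler variant used in the paper may differ in details), together with a toy blocking variable to cut down the comparison space:

    ```python
    from collections import defaultdict

    def jaro(s1, s2):
        """Jaro similarity: matching chars within a sliding window, transposition penalty."""
        if s1 == s2:
            return 1.0
        window = max(len(s1), len(s2)) // 2 - 1
        used = [False] * len(s2)
        m1 = []
        for i, c in enumerate(s1):
            lo, hi = max(0, i - window), min(len(s2), i + window + 1)
            for j in range(lo, hi):
                if not used[j] and s2[j] == c:
                    used[j] = True
                    m1.append(c)
                    break
        if not m1:
            return 0.0
        m2 = [c for j, c in enumerate(s2) if used[j]]
        m = len(m1)
        t = sum(a != b for a, b in zip(m1, m2)) / 2  # transpositions
        return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

    def jaro_winkler(s1, s2, p=0.1):
        """Boost the Jaro score for a shared prefix of up to 4 characters (Winkler)."""
        j = jaro(s1, s2)
        l = 0
        for a, b in zip(s1[:4], s2[:4]):
            if a != b:
                break
            l += 1
        return j + l * p * (1 - j)

    # Blocking: only compare names sharing a first letter (a toy blocking variable).
    blocks = defaultdict(list)
    for name in ["MARTHA", "MARHTA", "DUANE", "DWAYNE"]:
        blocks[name[0]].append(name)

    score = jaro_winkler("MARTHA", "MARHTA")  # classic textbook pair, approx. 0.961
    ```

    Blocking trades a small risk of missed matches (true duplicates falling in different blocks) for a large reduction in the number of pairwise comparisons, which is what makes batch analysis of hundreds of thousands of identities tractable.
    
    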

    Quality and complexity measures for data linkage and deduplication

    Summary. Deduplicating one data set or linking several data sets are increasingly important tasks in the data preparation steps of many data mining projects. The aim of such linkages is to match all records relating to the same entity. Research interest in this area has increased in recent years, with techniques originating from statistics, machine learning, information retrieval, and database research being combined and applied to improve the linkage quality, as well as to increase performance and efficiency when linking or deduplicating very large data sets. Different measures have been used to characterise the quality and complexity of data linkage algorithms, and several new metrics have been proposed. An overview of the issues involved in measuring data linkage and deduplication quality and complexity is presented in this chapter. It is shown that measures in the space of record pair comparisons can produce deceptive quality results. Various measures are discussed and recommendations are given on how to assess data linkage and deduplication quality and complexity. Key words: data or record linkage, data integration and matching, deduplication, data mining pre-processing, quality and complexity measures
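    A small worked example of why measures computed over the full record-pair comparison space can deceive: with hypothetical counts for two files of 1,000 records each, accuracy looks near-perfect while precision, recall and the F-measure tell a different story, because true non-matches dominate the pair space.

    ```python
    # Hypothetical linkage of two files of 1,000 records each: 1,000,000 candidate
    # pairs, only 1,000 of them true matches. Suppose a classifier finds 700 true
    # matches and makes 300 false positives.
    total_pairs = 1_000 * 1_000
    true_matches = 1_000
    tp, fp = 700, 300
    fn = true_matches - tp
    tn = total_pairs - tp - fp - fn

    accuracy = (tp + tn) / total_pairs            # dominated by the huge tn count
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    ```

    Here accuracy is 99.94% while the F-measure is only 0.70, which is why the chapter recommends measures that ignore the true-negative count when assessing linkage quality.
    
    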

    Accuracy and completeness of patient pathways – the benefits of national data linkage in Australia

    Background - The technical challenges associated with national data linkage, and the extent of cross-border population movements, are explored as part of a pioneering research project. The project involved linking state-based hospital admission records and death registrations across Australia for a national study of hospital related deaths. Methods - The project linked over 44 million morbidity and mortality records from four Australian states between 1st July 1999 and 31st December 2009 using probabilistic methods. The accuracy of the linkage was measured through a comparison with jurisdictional keys sourced from individual states. The extent of cross-border population movement between these states was also assessed. Results - Data matching identified almost twelve million individuals across the four Australian states. The percentage of individuals from one state with records found in another ranged from 3% to 5%. Using jurisdictional keys to measure linkage quality, results indicate a high matching efficiency (F-measure 97 to 99%), with linkage processing taking only a matter of days. Conclusions - The results demonstrate the feasibility and accuracy of undertaking cross-jurisdictional linkage for national research. The benefits are substantial, particularly in relation to capturing the full complement of records in patient pathways that result from cross-border population movements. The project identified a sizeable ‘mobile’ population with hospital records in more than one state. Research studies that focus on a single jurisdiction will under-enumerate the extent of hospital usage by individuals in the population. It is important that researchers understand and are aware of the impact of this missing hospital activity on their studies. The project highlights the need for an efficient and accurate data linkage system to support national research across Australia.
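    Measuring linkage quality against a gold-standard key, as done here with the jurisdictional keys, amounts to comparing the predicted record pairs with the pairs implied by the key. A minimal sketch with hypothetical record IDs:

    ```python
    def pair_quality(predicted, truth):
        """Precision, recall and F-measure over sets of record-ID pairs.

        `truth` is the set of pairs derived from a gold-standard key (here a
        hypothetical jurisdictional key); `predicted` comes from the linkage run.
        """
        tp = len(predicted & truth)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(truth) if truth else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f

    # Hypothetical pairs: (record in state A, record in state B).
    truth = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]}
    predicted = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"), ("a4", "b9")]}
    precision, recall, f = pair_quality(predicted, truth)
    ```

    Using `frozenset` for each pair makes the comparison order-independent, so a link stored as (A, B) still matches a truth pair recorded as (B, A).
    
    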

    How good is probabilistic record linkage to reconstruct reproductive histories? Results from the Aberdeen children of the 1950s study

    BACKGROUND: Probabilistic record linkage is widely used in epidemiology, but studies of its validity are rare. Our aim was to validate its use to identify births to a cohort of women, the women being drawn from a large cohort of people born in Scotland in the early 1950s. METHODS: The Children of the 1950s cohort includes 5868 females born in Aberdeen 1950–56 who were in primary schools in the city in 1962. In 2001 a postal questionnaire was sent to the cohort members resident in the UK requesting information on offspring. Probabilistic record linkage (based on surname, maiden name, initials, date of birth and postcode) was used to link the females in the cohort to birth records held by the Scottish Maternity Record System (SMR 2). RESULTS: We attempted to mail a total of 5540 women; 3752 (68%) returned a completed questionnaire. Of these, 86% reported having had at least one birth. Linkage to SMR 2 was attempted for 5634 women; one or more maternity records were found for 3743. There were 2604 women who reported at least one birth in the questionnaire and who were linked to one or more SMR 2 records. When judged against the questionnaire information, the linkage correctly identified 4930 births and missed 601 others. These mostly occurred outside of Scotland (147) or prior to full coverage by SMR 2 (454). There were 134 births incorrectly linked to SMR 2. CONCLUSION: Probabilistic record linkage to routine maternity records applied to a population-based cohort, using name, date of birth and place of residence, can have high specificity, and as such may be reliably used in epidemiological research.
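    From the counts reported above, two of the usual linkage-quality rates can be derived directly; the true-negative count needed for specificity is not reported, so it is not computed here.

    ```python
    # Counts taken from the abstract: 4,930 births correctly linked,
    # 601 reported births missed by linkage, 134 births incorrectly linked.
    tp, fn, fp = 4930, 601, 134

    sensitivity = tp / (tp + fn)  # share of questionnaire-reported births found
    ppv = tp / (tp + fp)          # share of accepted links that were correct
    ```

    This gives a sensitivity of about 89% and a positive predictive value of about 97%, consistent with the observation that most missed births fell outside SMR 2's coverage rather than being linkage failures.
    
    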

    A retrospective population-based study of childhood hospital admissions with record linkage to a birth defects registry

    Background: Using population-based linked records of births, deaths, birth defects and hospital admissions for children born 1980–1999 enables profiles of hospital morbidity to be created for each child. Methods: This is an analysis of a state-based registry of birth defects linked to population-based hospital admission data. Transfers and readmissions within one day could be taken into account and treated as one episode of care for the purposes of analyses (N = 485,446 children; 742,845 non-birth admissions). Results: Children born in Western Australia from 1980–1999 with a major birth defect comprised 4.6% of live births but 12.0% of non-birth hospital admissions from 1980–2000. On average, the children with a major birth defect remained in hospital longer than the children in the comparison group for the same diagnosis. The mean and median lengths of stay (LOS) for admissions before the age of 5 years have decreased for all children since 1980. However, the mean number of admissions per child admitted has remained constant at around 3.8 admissions for children with a major birth defect and 2.2 admissions for all other children. Conclusion: To gain a true picture of the burden of hospital-based morbidity in childhood, admission records need to be linked for each child. We have been able to do this at a population level using birth defect cases ascertained by a birth defects registry. Our results showed a greater mean LOS and mean number of admissions per child admitted than previous studies. The results suggest there may be an opportunity for the children with a major birth defect to be monitored and seen earlier in the primary care setting for common childhood illnesses, to avoid hospitalisation or reduce the LOS.
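    The episode-of-care construction described in the Methods, merging transfers and readmissions within one day into a single episode, can be sketched as follows; the admission dates are hypothetical.

    ```python
    from datetime import date, timedelta

    def collapse_episodes(admissions, gap=timedelta(days=1)):
        """Merge one child's (admit, discharge) stays into episodes of care.

        A stay beginning within `gap` of the previous discharge (a transfer or
        rapid readmission) is folded into the same episode.
        """
        episodes = []
        for admit, discharge in sorted(admissions):
            if episodes and admit - episodes[-1][1] <= gap:
                episodes[-1] = (episodes[-1][0], max(episodes[-1][1], discharge))
            else:
                episodes.append((admit, discharge))
        return episodes

    # Hypothetical stays for one child.
    stays = [
        (date(1995, 3, 1), date(1995, 3, 4)),
        (date(1995, 3, 5), date(1995, 3, 7)),    # readmitted next day: same episode
        (date(1995, 6, 10), date(1995, 6, 12)),  # months later: separate episode
    ]
    episodes = collapse_episodes(stays)
    ```

    Counting episodes rather than raw admissions avoids inflating the admission count when a transfer between hospitals generates two records for what is clinically one stay.
    
    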

    Smc5/6 coordinates formation and resolution of joint molecules with chromosome morphology to ensure meiotic divisions

    Structural Maintenance of Chromosomes (SMC) complexes underpin two fundamental features of meiosis: homologous recombination and chromosome segregation. While meiotic functions of the cohesin and condensin complexes have been delineated, the role of the third SMC complex, Smc5/6, remains enigmatic. Here we identify specific, essential meiotic functions for the Smc5/6 complex in homologous recombination and the regulation of cohesin. We show that Smc5/6 is enriched at centromeres and cohesin-association sites, where it regulates sister-chromatid cohesion and the timely removal of cohesin from chromosomal arms, respectively. Smc5/6 also localizes to recombination hotspots, where it promotes normal formation and resolution of a subset of joint-molecule intermediates. In this regard, Smc5/6 functions independently of the major crossover pathway defined by the MutLγ complex. Furthermore, we show that Smc5/6 is required for stable chromosomal localization of the XPF-family endonuclease Mus81-Mms4/Eme1. Our data suggest that the Smc5/6 complex is required for specific recombination and chromosomal processes throughout meiosis and that in its absence, attempts at cell division with unresolved joint molecules and residual cohesin lead to severe recombination-induced meiotic catastrophe.